COVID-19 Brief Data Analysis

Introduction

Our domain of interest is the impact of COVID-19 in the U.S. This spring, an outbreak of the virus has threatened hundreds of thousands of lives in the U.S., and even more around the world. Our group hopes that our analysis of multiple COVID-19 datasets can raise vigilance and help people understand this disease better. The first dataset is a comprehensive one covering each U.S. county, with information on weather, social and health conditions, and the local COVID-19 situation. Since its size exceeds the upload limit, we decided to truncate it. We also found a second dataset, Provisional COVID-19 Death Counts, broken down by state, sex, and age, that could help us understand the bigger picture. Finally, the US State COVID-19 Daily dataset collects the number of daily cases in the U.S., and a helper dataset stores the latitude and longitude of the states and the country to support our charts.

Summary Information

We used the Provisional COVID-19 Death Counts dataset to understand the general situation in the U.S. We looked at the impact across age groups, sexes, and geographic locations. According to our summary of this dataset, COVID-19 has caused 771637 deaths in the U.S. so far. The place with the most COVID-19 deaths is New York City, with 25190, more than many entire states. The age group with the most COVID-19 deaths in the U.S. is 85 years and over. Total COVID-19 deaths are 48815 for males and 38301 for females.
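The sex breakdown can also be expressed as shares. The following is a minimal sketch that uses only the two totals quoted above and assumes they are directly comparable; the variable names are ours and are not taken from the original analysis.

```python
# Totals quoted in the summary above.
deaths_by_sex = {"male": 48815, "female": 38301}

# Combined total of the two groups, used as the denominator for each share.
total = sum(deaths_by_sex.values())
shares = {sex: 100 * count / total for sex, count in deaths_by_sex.items()}

for sex, pct in shares.items():
    print(f"{sex}: {deaths_by_sex[sex]} deaths ({pct:.1f}% of the two-group total)")
```

Shares make the gap between the two groups easier to compare than raw counts.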

A Summary Table

Our summary table summarizes the total number of tests conducted in the U.S. It also shows the number of positive and negative results, along with the percentage of positive results out of all tests in each state. The data is grouped by state so that we can compare the situations across states. The table is sorted in descending order by the percentage of positive results. From the table, we can conclude that New Jersey has the highest percentage of positive COVID-19 tests among all the states. Note that even though New York has conducted significantly more tests than New Jersey, its percentage of positive results is lower. Another observation is that some states have conducted very few tests; Wyoming, for example, reports only 15059.

State | Total Positive Cases | Total Negative Cases | Total Tests | Percent of Positive Cases (%)
NJ | 140742 | 292317 | 433059 | 32.50
NY | 338479 | 886580 | 1225059 | 27.63
CT | 34333 | 104091 | 138424 | 24.80
DC | 6485 | 24559 | 31044 | 20.89
DE | 6741 | 26540 | 33281 | 20.25
MD | 34061 | 135425 | 169486 | 20.10
PR | 2294 | 9304 | 11598 | 19.78
MA | 79324 | 322164 | 401488 | 19.76
PA | 57989 | 237989 | 295978 | 19.59
CO | 19879 | 88759 | 108638 | 18.30
NE | 8572 | 39354 | 47926 | 17.89
IL | 83017 | 388546 | 471563 | 17.60
IN | 25126 | 125383 | 150509 | 16.69
VA | 25800 | 129511 | 155311 | 16.61
IA | 12912 | 68361 | 81273 | 15.89
MI | 48012 | 259869 | 307881 | 15.59
SD | 3663 | 21529 | 25192 | 14.54
LA | 32050 | 195962 | 228012 | 14.06
GA | 34633 | 227544 | 262177 | 13.21
KS | 7116 | 46989 | 54105 | 13.15
RI | 11613 | 83625 | 95238 | 12.19
OH | 25250 | 192474 | 217724 | 11.60
MN | 12494 | 108304 | 120798 | 10.34
MS | 9908 | 87784 | 97692 | 10.14
NV | 6310 | 57750 | 64060 | 9.85
AZ | 11734 | 111079 | 122813 | 9.55
NH | 3158 | 32391 | 35549 | 8.88
WI | 10610 | 112729 | 123339 | 8.60
SC | 7927 | 85208 | 93135 | 8.51
MO | 10006 | 111290 | 121296 | 8.25
AL | 10310 | 122908 | 133218 | 7.74
NC | 15345 | 186898 | 202243 | 7.59
TX | 39868 | 485828 | 525696 | 7.58
FL | 41921 | 537657 | 579578 | 7.23
ID | 2260 | 30418 | 32678 | 6.92
WA | 17121 | 234986 | 252107 | 6.79
CA | 69329 | 963526 | 1032855 | 6.71
KY | 6677 | 97340 | 104017 | 6.42
ME | 1477 | 22091 | 23568 | 6.27
AR | 4164 | 66274 | 70438 | 5.91
VI | 68 | 1115 | 1183 | 5.75
TN | 16110 | 267713 | 283823 | 5.68
OK | 4731 | 91379 | 96110 | 4.92
NM | 5069 | 101636 | 106705 | 4.75
WY | 675 | 14384 | 15059 | 4.48
VT | 927 | 20327 | 21254 | 4.36
OR | 3283 | 74291 | 77574 | 4.23
UT | 6431 | 147053 | 153484 | 4.19
GU | 149 | 3916 | 4065 | 3.67
ND | 1571 | 46261 | 47832 | 3.28
WV | 1371 | 63697 | 65068 | 2.11
MT | 461 | 22563 | 23024 | 2.00
HI | 633 | 37305 | 37938 | 1.67
AK | 383 | 29570 | 29953 | 1.28
MP | 19 | 2854 | 2873 | 0.66
AS | 0 | 105 | 105 | 0.00
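For readers who want to reproduce the last two columns, here is a minimal sketch of the calculation in Python. It assumes the total is the sum of positive and negative results, which matches the figures in the table. The function name `summarize` and the four-row sample are our own choices for illustration, not the code used to build the original table.

```python
# A few rows copied from the summary table: (state, positive, negative).
rows = [
    ("NY", 338479, 886580),
    ("NJ", 140742, 292317),
    ("CT", 34333, 104091),
    ("AS", 0, 105),
]

def summarize(rows):
    """Return (state, positive, negative, total, percent_positive),
    sorted by percent_positive in descending order."""
    table = []
    for state, positive, negative in rows:
        total = positive + negative
        # Guard against a state that has reported no tests at all.
        percent = round(100 * positive / total, 2) if total else 0.0
        table.append((state, positive, negative, total, percent))
    return sorted(table, key=lambda row: row[4], reverse=True)

summary = summarize(rows)
for state, positive, negative, total, percent in summary:
    print(f"{state} {positive} {negative} {total} {percent:.2f}")
```

Sorting on the percentage rather than on the raw count is what places New Jersey above New York, even though New York has more positive results.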

Charts

First Chart

Our first chart is a line chart that shows the daily COVID-19 death rate from January to mid-May of this year. We include this chart because we would like to see how the death rate changes over time. According to the graph, the death rate spiked at the end of February and kept increasing until the end of March. It then decreased steadily for about a month before rising again, and it has been relatively constant for about the past month.

Second Chart

Our second chart is a map that marks each state with a circle and shows the number of positive COVID-19 cases through the size of the circle. We included this chart because we wanted a map that shows geographic location and the number of positive cases at the same time. From this map, we can tell that the Northeast suffers the most, and California has noticeably more cases than other western states. As one would expect, the more populous a state is, the more cases it tends to have, which is also reflected in the Midwest generally having fewer cases.
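One common way to size circles on such a map is to make the circle's area, rather than its radius, proportional to the case count, so that large states are not visually exaggerated. The sketch below illustrates that idea with counts from the summary table. We do not know how the original chart scaled its circles, and `circle_radii` and `max_radius` are hypothetical names chosen for this example.

```python
import math

# Positive-case counts copied from the summary table.
positive_cases = {"NY": 338479, "NJ": 140742, "CA": 69329, "AK": 383}

def circle_radii(cases, max_radius=40.0):
    """Scale radii so that circle AREA is proportional to the case count.

    max_radius is the radius (in arbitrary plot units) given to the
    state with the most cases; it is an assumed, illustrative value.
    """
    largest = max(cases.values())
    if largest == 0:
        return {state: 0.0 for state in cases}
    return {state: max_radius * math.sqrt(n / largest) for state, n in cases.items()}

radii = circle_radii(positive_cases)
for state, radius in radii.items():
    print(f"{state}: radius {radius:.1f}")
```

With this scaling, the ratio of two circles' areas equals the ratio of their case counts.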

Third Chart

Our third chart is a scatter plot of the percentage of the population that smokes against the COVID-19 death rate for each U.S. state. We include this chart because we would like to see whether there is an association between smoking and COVID-19 deaths, since smoking damages the lungs and COVID-19 primarily affects the lungs as well. According to our plot, there is no clear correlation between the smoking proportion and the COVID-19 death rate.
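A visual impression like this can be backed by a number such as the Pearson correlation coefficient, where values near 0 indicate little linear association. The following is a minimal sketch of that calculation. The report does not say that a coefficient was computed, and the sample values below are made-up placeholders, not figures from our datasets.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Made-up placeholder values, NOT taken from the report's datasets.
smoking_pct = [12.0, 15.5, 18.0, 20.5, 23.0]
death_rate = [3.1, 5.0, 2.2, 4.8, 3.0]

r = pearson_r(smoking_pct, death_rate)
print(f"r = {r:.2f}")
```

Replacing the two placeholder lists with the per-state smoking percentages and death rates would give the coefficient for the actual scatter plot.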